Since I have finished up my PhD and am still on the industry job market, I have found myself doing a lot of projects to satisfy my curiosity and to introduce myself professionally outside of my work in political science. I love watching football and learning about football, and I spend most of my working time listening to various NFL podcasts and the Learning Bayesian Stats podcast. For the last few years I have been fascinated by Bayesian stats. One of the great things about Bayesian stats is that we can talk about uncertainty in a more intuitive way, and because of the mechanics of Bayesian models we get some awesome looking plots.
I got interested in Alex Andorra and Maximilian Göbel’s Soccer Factor Model.1 They extend a common model in asset pricing called the factor model, where we adjust for macroeconomic factors to determine whether a particular portfolio manager is skilled at picking stocks or whether their returns could be explained by the economy improving or worsening. They carry this idea over to soccer by adjusting for macro-factors that affect the team. Whatever remains is attributable to a player’s individual skill. They focus on goal scoring as an observable feature of a player’s latent skill.
After reading through the example notebooks and the paper, I thought that this was not only an interesting idea but one that probably has a pretty strong crossover with touchdowns in football. If we take a quick peek at the data, the two distributions look a lot like each other. The data used in the goals plot are a subset of the data that they use in their Sloan Analytics paper. From my very limited knowledge of ball most of these guys seem pretty good, so we are likely to see more players with one goal.
Why touchdowns? Outside of being a fun exercise to see how a model designed for soccer translates to football, I think there is at least a reasonable football story for why we can use touchdowns as an outcome. For one, a touchdown acts as mouthwash for fantasy scoring. I am in a Half Points per Reception league, meaning that a reception is worth 0.5 points, each receiving yard is worth 0.1 points, and a receiving touchdown is worth six points. So a league-average receiving performance with no touchdowns is worth 5.74 fantasy points. A touchdown, or two, turns a baddish fantasy football performance into a relatively good one.
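To make the scoring arithmetic concrete, here is a minimal sketch of the half-PPR rules described above. The function name `half_ppr_points` and the stat line are made up for illustration; the numbers are not league averages.

```r
# Half-PPR receiving score using the league settings described above:
# 0.5 per reception, 0.1 per receiving yard, 6 per receiving touchdown.
# `half_ppr_points` is a hypothetical helper, not part of any package.
half_ppr_points <- function(receptions, rec_yards, rec_tds) {
  0.5 * receptions + 0.1 * rec_yards + 6 * rec_tds
}

# Hypothetical stat line (illustrative numbers only)
no_td  <- half_ppr_points(receptions = 4, rec_yards = 40, rec_tds = 0)
one_td <- half_ppr_points(receptions = 4, rec_yards = 40, rec_tds = 1)
two_td <- half_ppr_points(receptions = 4, rec_yards = 40, rec_tds = 2)

print(c(no_td = no_td, one_td = one_td, two_td = two_td))
```

Each touchdown adds a flat six points, so for a modest stat line like this one a single score doubles the day.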
Outside of the pretend team that I manage, TDs could also serve as a way to tap a pass catcher’s latent ability. To be a good pass catcher in the NFL you have to combine good route running, athleticism, and the ability to catch the ball. You could argue that being a good pass catcher becomes more difficult as a team gets closer to the endzone. If we use some real player tracking data provided by the NFL we can see some of the difficulties of being a receiver in the endzone. If we look at the person that actually catches the touchdown, there are three defenders in the area if we count the corner playing Tyreek Hill. If we turn our attention to the outlet pass (number 35 in red), there are three defenders in the area when he slips out to make himself available.
Undoubtedly the probability of scoring increases as the offense gets closer to the endzone, but you have less room to get open. There is a pretty good case that as the field shrinks you have to be a crisper route runner and a better pass catcher, since space is more limited. During a scramble drill you have to have a feel for where the defense is and for your QB’s arm strength. If your QB is slightly off you are likely going to have to make a contested catch because everybody is a lot closer. One small caveat is that I don’t understand all the nuances of play design and designing an offense, but you could imagine that it definitely matters. That being said, as a play designer you need to think about how to keep defenders where you want them. In the play above Charcandrick West (number 35) likely has two duties. I would imagine that he is the answer if nobody is open, and he is likely tasked with occupying the linebackers so they do not sink into coverage and close the window for number 84.
Even with an explosive play more or less at the edge of the red zone, space is still at a premium. Let’s look at this touchdown pass to Tyreek Hill where his skill as a receiver is on display. If we look at the highlight of this play, the corner Sam Shields tries to disrupt the route by jamming Hill at the line. Hill avoids the jam, uses his speed to outrun Shields, and catches the ball even with Shields right in his face. This is to say that even though there is a higher probability of scoring, your skills as a receiver are accentuated because of the tight quarters.
Obviously we measure wide receiver production in a lot of different ways. Some of the most obvious alternatives are efficiency metrics like yards per route run, usage statistics like target share or targets per route run, or just modeling production directly, whether that is receiving yards or yards after catch. In fairness to the nflreadr team, they do have this data. Another potential alternative is estimating a separation score as a way to measure how a receiver is developing as a route runner. You could totally model this data using offensive personnel as one of the groups and then model your yards metric of choice. However, in my wildest dreams I would like to use this model throughout the season to inform fantasy football decisions. The participation data that is provided is fantastic, but you have to wait until the end of the season to get it. This is likely going to be a future exercise for me.
To fit the model I want, we are going to have to leverage information we have prior to the game: mainly some measure of the receiver’s passing offense, how good the defense they are playing is, how fresh the player is, some measure of form, their aging curve, weather, and what kind of game we think it is going to be. In essence we are adjusting for factors that are going to affect the receiver and the probability of touchdowns. The covariates I use in the model are:
The difference between the defensive team’s passing EPA per game and the pass catcher’s team’s passing EPA per game.
The rest differential between the receiver’s team and the opposition.
Air yards per pass
Weather: Mainly wind and temperature.
Total line: a combination of both teams’ projected points according to Vegas
Four binary indicators: whether the receiver is playing a home game, playing inside, playing a division game, and whether the game is post 2018
I use the total line as a proxy for what kind of game Vegas thinks it is going to be. You could instead use whether the pass catcher’s team is favored to win, but I include the total line as a way to tap the opposing team’s offensive potential. Effectively I am trying to tap what we think the game script is going to be going into the game. If we think it is going to be a high scoring game, then this may force one or both teams to rely on a more run heavy script to keep the opposing offense off the field. I include the difference between the defensive team’s passing EPA per game and the pass catcher team’s passing EPA per game. Effectively I am trying to adjust for how much better the opposing team’s defense is playing going into the matchup. I also include weather and surface as potential confounders. If it is windy and rainy and the game is outdoors, we are probably not going to see a ton of passes because the ball is harder to throw and catch. I also include whether it is a division game to capture the teams’ familiarity with each other.
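As a rough sketch of how the differenced covariates and the post 2018 indicator could be built from game-level rows: every column name and value below is hypothetical, and both the sign convention (defense minus offense, receiver’s team minus opponent) and the choice to code the 2018 season itself as post 2018 are assumptions on my part.

```r
# Toy game-level rows; all column names and values are hypothetical.
games <- data.frame(
  season         = c(2017, 2018, 2021),
  def_pass_epa   = c(0.05, -0.10, 0.02),  # opposing defense, passing EPA per game
  off_pass_epa   = c(0.12, 0.08, -0.04),  # receiver's team, passing EPA per game
  team_rest_days = c(7, 10, 6),           # receiver's team
  opp_rest_days  = c(7, 6, 13),           # opposition
  total_line     = c(44.5, 51.0, 47.5)    # Vegas total, used as is
)

model_dat <- transform(
  games,
  # Assumed sign convention: defense minus offense
  epa_diff  = def_pass_epa - off_pass_epa,
  # Assumed sign convention: receiver's team minus opponent
  rest_diff = team_rest_days - opp_rest_days,
  # Assumption: the 2018 season counts as "post 2018"
  post_2018 = as.integer(season >= 2018)
)

print(model_dat[, c("season", "epa_diff", "rest_diff", "total_line", "post_2018")])
```

Only quantities known before kickoff go in, which is what would let the model be used during the season.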
The post 2018 indicator probably deserves a little more exposition. In 2018 the NFL introduced a series of new rules, in part, to promote passing. The big change was a revision to the catch rules to try to eliminate some notably controversial calls. A catch happens when a receiver establishes themselves in bounds and performs a “football move.” Additionally, the ball is allowed to move as long as it is in the receiver’s control. In the clip below Brandon Aiyuk makes a great catch where the ball moves during the play. Prior to 2018 this likely would have been ruled an incomplete pass and the 49ers would have likely kicked a field goal.
To model time I make use of Hilbert Space Gaussian Processes (HSGP). Most textbook definitions of a Gaussian Process (GP) start with the idea that this is a wholly uninformative name. Effectively a Gaussian process is a collection of random variables where any finite subset has a joint Gaussian distribution. It is effectively just an infinite vector, a.k.a. a function, that we place a prior over. Generally Gaussian processes are used to model time or space or both. Mathematically this involves a lot of matrix inversion to get the posterior covariance. What this means practically is that fitting is \(O(n^3)\) in the number of observations. To get a sense of what that means I plotted how long it would take to fit a single Gaussian process. Game level NFL data is not necessarily all that big, but there are about 2080 games in the nflreadr database before we even get to the play by play data, and we are including data from just about every wide receiver to take a snap. To get around having to wait 30+ hours to fit a model we can use a low-rank approximation of a GP known as an HSGP. The approximation uses a fixed set of basis functions to capture the wiggliness of the function, which basically converts everything from a matrix inversion to a matrix multiplication, a much faster operation.
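To see what the basis-function trick looks like, here is a minimal base R sketch of an HSGP approximation to a one-dimensional Matérn 3/2 kernel. It is not the implementation used to fit the model in this post; the helper names, the number of basis functions `m`, the boundary `L`, and the hyperparameter values are all assumptions picked for illustration.

```r
# Exact Matern 3/2 kernel between two sets of 1-D inputs
matern32 <- function(x1, x2, sigma = 1, ell = 1) {
  r <- abs(outer(x1, x2, "-"))
  sigma^2 * (1 + sqrt(3) * r / ell) * exp(-sqrt(3) * r / ell)
}

# Spectral density of the 1-D Matern 3/2 kernel
matern32_spd <- function(omega, sigma = 1, ell = 1) {
  sigma^2 * 4 * 3^(3 / 2) / ell^3 * (3 / ell^2 + omega^2)^(-2)
}

# Laplacian eigenfunctions on [-L, L]; returns an n x m basis matrix
hsgp_basis <- function(x, m, L) {
  sqrt_lambda <- seq_len(m) * pi / (2 * L)
  phi <- sqrt(1 / L) * sin(outer(x + L, sqrt_lambda))
  list(phi = phi, sqrt_lambda = sqrt_lambda)
}

set.seed(1)
x <- seq(-1, 1, length.out = 50)  # inputs assumed centered and scaled
m <- 100                          # number of basis functions (assumed)
L <- 4                            # boundary, well outside the data (assumed)

basis <- hsgp_basis(x, m, L)
spd   <- matern32_spd(basis$sqrt_lambda)

# One prior draw: a matrix multiply, with no n x n inversion anywhere
z      <- rnorm(m)
f_draw <- basis$phi %*% (sqrt(spd) * z)

# Covariance implied by the basis vs. the exact kernel
K_approx    <- basis$phi %*% (spd * t(basis$phi))
K_exact     <- matern32(x, x)
max_abs_err <- max(abs(K_approx - K_exact))
print(max_abs_err)
```

The basis matrix depends only on the inputs, so it can be computed once. The hyperparameters enter only through the spectral density weights, which is where the savings over factorizing an n by n covariance matrix come from.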
We are interested in modeling two different time components that don’t have an obvious functional form. The first is how well a player is playing in a particular season. They could be having an awesome season and that is carrying over from game to game. More critically, we are interested in how experience impacts ability. In the most optimistic case you get a 21 year old rookie into your building and in year one they are at or above league average, but have some maturing to do with the finer aspects of being a pass catcher. By the time they get to their second contract they may not be as fast as they were coming out of college, but they are an overall better pass catcher. Then towards the end of their career they dip back to where they were as a rookie because they have taken a step back athletically.
This linearish story of receiver ability, and of a player’s ability in general, is one that fanbases, GMs, and coaches would love to be true, but it rarely ever happens. Tight end is a position with a big jump from college to the NFL for a variety of reasons. George Kittle is a great example of the diversity of responsibilities that an elite tight end has in the NFL. Part of what makes him elite is that he is an awesome blocker who can be used at the point of attack. Sometimes this includes blocking a team’s best edge rusher, which is a difficult task for elite tackles, never mind a tight end. To alleviate some of the difficulty Shanahan uses a lot of motion to try to create advantageous angles and headstarts. The rub is that the motion and blocking on a run play should look the same as when he is used on play action. As you can imagine this is pretty difficult, especially when you are just getting used to the size and speed of an NFL defender and the complexity of the NFL.
Travis Kelce is another great example of the difficulty of being a pass catcher in the NFL. Over the years Kelce has built a big reputation for his improvisation in route running.2 A lot of the plays that get dialed up for him are choice routes where he can decide what route to run based on the coverage. You can run what is known as “pause and bounce,” where the pass catcher “misses the count” and is deliberately a tick slow. To combat under center play action, defenses will change the picture after the snap or switch coverage. By delaying your route you can get more information about the coverage before you run it. As you can imagine this takes a lot of preparation and experience to execute. This maturation process is likely not linear and is not going to have the same effect on every player.
The Fun Stuff: Modeling the Data
I fit an ordered logit for each player i in game g within each season s. The rough sketch of the model takes this form. For a more detailed look at the data collection, data cleaning, and modeling files I will point you towards the files in the script folder. The sandbox folder is really a way for me to play around with the various aspects of tuning the model. \[
\begin{aligned}
\ell_{experience}, \ell_{form} &\sim \text{InverseGamma}(\alpha, \beta) \\
\sigma_{experience}, \sigma_{form} &\sim \text{Exponential}(\lambda) \\
\beta_{k} &\sim \mathcal{N}(\mu_{factor}, \sigma_{factor}^{2}), \quad k = 1, \ldots, p \\
\sigma_{player} &\sim \text{Exponential}(1) \\
\sigma_{baseline} &= \sqrt{\sigma^2_{player} + \frac{\sigma^2_{cutpoints}}{J}} \\
\beta_{0} &\sim \mathcal{N}(0, \sigma^2_{baseline}) \\
\alpha_j &= \beta_{0} + \alpha_{j}^{raw}, \quad \text{where } \sum^{J}_{j=1}\alpha^{raw}_{j} = 0, \quad \alpha^{raw}_{j} \sim \mathcal{N}(0, \sigma^2_{i}) \\
f_{experience}(s) &\sim \mathcal{GP}(0, \sigma^2_{experience} \cdot K_{\text{Matérn}}(\cdot, \cdot;\ell_{experience})) \\
f_{form}(g) &\sim \mathcal{GP}(0, \sigma^2_{form} \cdot K_{\text{Matérn}}(\cdot, \cdot;\ell_{form})) \\
N_i &= \alpha_i + f_{experience}(s_i) + f_{form}(g_i) + \mathcal{X}^{\top}_{i}\beta \\
\text{Touchdowns}_{i} &\sim \text{Ordered Logit}(N_{i}, c_{i})
\end{aligned}
\]
The majority of this section will focus on the prior predictive simulations that were necessary to tune the model and then on extracting the posterior predictions and plotting them. Because of the structure of the model this takes a fair bit of time to fit.
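To get a feel for how an ordered logit turns a linear predictor and a set of cutpoints into probabilities for each touchdown category, here is a small sketch. The helper `ordered_logit_probs`, the cutpoint values, and the linear predictor values are hypothetical. It uses the common parameterization where the cumulative probability is the inverse logit of the cutpoint minus the linear predictor.

```r
# P(Y = k) for an ordered logit with linear predictor `eta` and
# increasing `cutpoints`, using P(Y <= k) = plogis(cutpoint_k - eta).
# `ordered_logit_probs` is a hypothetical helper for illustration.
ordered_logit_probs <- function(eta, cutpoints) {
  stopifnot(!is.unsorted(cutpoints))
  cum <- c(plogis(cutpoints - eta), 1)  # cumulative probabilities
  diff(c(0, cum))                       # category probabilities
}

cutpoints  <- c(1.5, 3.0, 4.5)  # hypothetical cutpoints for 0 / 1 / 2 / 3+
categories <- c("0", "1", "2", "3+")

probs_avg  <- setNames(ordered_logit_probs(0, cutpoints), categories)
probs_good <- setNames(ordered_logit_probs(1, cutpoints), categories)

print(round(rbind(average = probs_avg, above_average = probs_good), 3))
```

A larger linear predictor moves mass out of the zero touchdown category and into the higher ones, which is how a player effect or a favorable matchup shows up in the predictions.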
An ordered categorical seems like kind of a weird fit since we are really just using counts. However, we don’t have a ton of mass in the 3+ touchdown range. Even in the 2+ touchdown range we are working with even less mass than the goal scoring data that Max and Alex are using. I would imagine that if they included attacking midfielders then the counts would look a little more similar. We could use what I like to call the “you must be this tall to ride the ride” approach, meaning we could throw out any pass catcher without enough games played or enough targets. However, we may be getting rid of some interesting comparative information when we want to go and calculate replacement level stats.
Additionally, while there is no technical upper bound on the number of touchdowns you could score in a game or a season, there are some practical bounds on the total number of touchdowns. The current single season record is held by Randy Moss with 23, a record that is 18 years old. It broke Jerry Rice’s single season record of 22, which had stood for 20 years. The current single game record is a three way tie between Kellen Winslow, Bob Shaw, and Jerry Rice, with each player having 5 receiving touchdowns in a single game. No receiver since Jerry Rice in 1990 has had 5 receiving touchdowns in a game.3
```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stack the observed, lumped, and binary touchdown outcomes into long format
hist_dat = empircal_dat |>
  mutate(binary = ifelse(rec_tds >= 1, 1, 0)) |>
  pivot_longer(c(rec_tds, rec_tds_game, binary)) |>
  mutate(name = case_when(
    name == 'rec_tds_game' ~ 'Observed TDs',
    name == 'rec_tds' ~ 'Lumped TDs',
    name == 'binary' ~ 'Indicator TDs'
  ))

# Dodged bars put the three versions of the outcome side by side
ggplot(hist_dat, aes(x = value, fill = name)) +
  geom_bar(alpha = 0.5, position = 'dodge') +
  labs(x = 'Touchdowns', y = 'Count', fill = NULL) +
  MetBrewer::scale_fill_met_d(name = 'Lakota') +
  scale_y_continuous(labels = scales::comma)
```
Instead of using the full observed range I am just going to lump 3 touchdowns and 4 touchdowns together. Functionally nothing really changes because 3 touchdowns is still a relatively rare occurrence. Even when we create a simple binary indicator we are not really changing things too much. I decided to use an ordered logit because a two or three touchdown game is still useful for understanding how much better a pass catcher is than league average. Generally, the great pass catchers have lots of multiple touchdown games. There are some games where a semi-random pass catcher may have a multiple TD game, but these are few and far between. Kyle Juszczyk has been an excellent receiving fullback in his career. However, he is not necessarily a major scoring threat, with only one game where he has scored multiple receiving touchdowns.
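The lumping described above can be sketched in a couple of lines. The touchdown counts here are made up, the variable names mirror the ones in the plotting code, and capping everything at 3 (which also folds in a 5 touchdown game) is an assumption about how the top category is handled.

```r
# Hypothetical per-game receiving touchdown counts
rec_tds_game <- c(0, 1, 0, 2, 3, 4, 0, 1, 5)

# Lumped outcome: anything at or above 3 lands in the top "3+" category
# (assumption: games above 4 touchdowns are folded in as well)
rec_tds <- pmin(rec_tds_game, 3)

# Simple indicator: did the pass catcher score at all?
binary <- as.integer(rec_tds_game >= 1)

print(table(observed = rec_tds_game))
print(table(lumped = rec_tds))
print(table(indicator = binary))
```

The lumped version keeps the distinction between one and two touchdown games, which the binary indicator throws away.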
The biggest difference that I found when changing the goal scoring model to the touchdown scoring model was dealing with time. The soccer season is considerably longer than the football season, with 38 matchdays, while the length of the football season ranges from 14 to 17 games across the seasons in the data. Fitting two GPs into one season is feasible but a little bit overkill. Careers in the NFL also tend to be a bit shorter than in European high-level soccer. My guess is that there are a lot more opportunities to play professional soccer in Europe. In part, you do not have a salary cap in European soccer, so you don’t have to worry about salary management. You can also loan players to competing teams to give them playing time. This amounts to a longer average career length for a soccer player. This buys us a little bit more information, but adds more complex time dynamics.
Footnotes
I am American so I will just call it Soccer and call American Football Football.